Squeeze: Efficient compact fractals for tensor core GPUs
Authors
Abstract
This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, λ(ω) and ν(ω), which act as compact-to-expanded and expanded-to-compact space functions, respectively. The cost of the maps is O(log2 logs(n)) time, with n being the side of an n×n embedding for the fractal in its expanded form, and s the linear scaling factor. The proposed approach works for any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Experimental results using a discrete Sierpinski Triangle as a case study show up to ~12× speedup and a memory reduction factor of up to ~315× with respect to a GPU-based expanded-space bounding box approach. These results show that the proposed compact approach will allow the scientific community to efficiently tackle problems that up to now could not fit into GPU memory.
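To make the idea of compact and expanded spaces concrete, the following is a minimal CPU sketch in Python for the discrete Sierpinski Triangle (linear scaling factor s = 2, three sub-boxes per level). It is not the tensor-core formulation from the paper: the function names, the base-3 digit encoding, and the choice of the cells with `x & y == 0` as the fractal are assumptions made for illustration, and the loops cost O(log n) per map rather than the O(log2 logs(n)) reported in the abstract.

```python
def compact_to_expanded(i, r):
    """Hypothetical compact-to-expanded map: index i in [0, 3**r) -> (x, y)
    inside the 2**r x 2**r embedding of a level-r Sierpinski Triangle."""
    x = y = 0
    for k in range(r):
        i, d = divmod(i, 3)      # base-3 digit d picks one of three sub-boxes
        if d == 1:
            x |= 1 << k          # sub-box shifted along x at scale 2**k
        elif d == 2:
            y |= 1 << k          # sub-box shifted along y at scale 2**k
    return x, y


def expanded_to_compact(x, y, r):
    """Hypothetical expanded-to-compact map: (x, y) -> compact index,
    or None when the cell is outside the embedding or not in the fractal."""
    n = 1 << r
    if not (0 <= x < n and 0 <= y < n):
        return None
    if x & y:                    # both bits set at some level: an empty cell
        return None
    i = 0
    for k in reversed(range(r)):
        d = ((x >> k) & 1) + 2 * ((y >> k) & 1)
        i = i * 3 + d            # rebuild the base-3 index, top digit first
    return i


def right_neighbor(i, r):
    """Neighborhood access in compact space: go to expanded coordinates,
    step one cell along x, and map back. Returns None if there is no cell."""
    x, y = compact_to_expanded(i, r)
    return expanded_to_compact(x + 1, y, r)


def memory_reduction(r):
    """Ratio of expanded cells (4**r) to compact cells (3**r) at level r."""
    return 4 ** r / 3 ** r


if __name__ == "__main__":
    r = 4
    cells = [compact_to_expanded(i, r) for i in range(3 ** r)]
    print(len(cells), "fractal cells stored instead of", 4 ** r)
    print("right neighbor of compact cell 0:", right_neighbor(0, r))
```

Under these assumptions only the 3^r fractal cells are stored, while a neighbor lookup goes through expanded coordinates and back. The storage ratio (4/3)^r grows with the level r, which is the kind of saving a compact representation aims for.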
Similar resources
Fractals Image Rendering and Compression using GPUs
Fractal image compression offers significant advantages over conventional image compression. Although fractal image encoding time is considerably higher than that of conventional methods, decoding is far faster and almost instantaneous. Moreover, fractal images are resolution-independent, meaning that these images will render with the same intensity and quality even when...
High-Performance Tensor Contractions for GPUs
We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions plus application-specifics, we demonstrate close to peak performance results. I...
Efficient softmax approximation for GPUs
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computation...
Efficient Synchronization Primitives for GPUs
In this paper, we revisit the design of synchronization primitives (specifically barriers, mutexes, and semaphores) and how they apply to the GPU. Previous implementations are insufficient due to the discrepancies in hardware and programming model of the GPU and CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well as a method of sleeping on the G...
Efficient Parallel RSA Decryption Algorithm for Many-core GPUs with CUDA
Cryptography is an important technique in many applications. In telecommunications, cryptography is necessary when data is communicated over an untrusted medium in the network. RSA is a public-key cryptography algorithm that uses a pair (N, E) as the public key and D as the private key. N is the product of two large prime numbers p and q that are kept secret. It is very hard and no known polyno...
Journal
Journal title: Future Generation Computer Systems
Year: 2022
ISSN: ['0167-739X', '1872-7115']
DOI: https://doi.org/10.1016/j.future.2022.04.023